(54) Method of storing and retrieving image data.
(57) In order to utilize handwritten memo data at the time
of retrieving a document, the original document image
data is stored and, thereafter, document image data with
memo is separately input, the document image data with
memo being comprised of the original document image onto
which memo data is additionally written by hand. The
position of the document image data with memo is then
brought into alignment with the original document image
data, and the two are collated to extract only the memo data
written by hand. Then the kind of memo is discriminated,
processing is effected depending upon the kind of memo, and
the memo data is stored in the secondary data file for
retrieval. To retrieve the document, the memo data of a
plurality of documents are read out from the secondary data
file depending upon the kind of memo that is designated,
and are displayed as a look-up table. If the operator
designates any one of them, the image of the corresponding
original document is read out from the original document
image file and is displayed.
METHOD OF STORING AND RETRIEVING IMAGE DATA
Background of the Invention 
Field of the Invention 

The present invention relates to a method of
storing and retrieving image data. More specifically,
the invention relates to a method of storing secondary
data, such as memos added to a document image, as index
data for the document image, and to a method of retrieving
the document image by utilizing the secondary data.
Description of the Prior Art 

With the recent trend toward the practical use of
optical disc devices capable of storing large amounts
of data, attention has been given to document image
filing systems as a new means of document control that
electronically files document data and retrieves the
data by using a display device. If the content of a
document is treated as an image, a document which includes
figures and photographs in addition to characters
can be electronically filed. Therefore, a wide
range of documents such as general literature,
books and slips, design drawings, written contracts,
and the like can be stored in a memory device.

According to a conventional retrieval system,
index data such as names of the documents, classification
codes, keywords and the like are registered
through the keyboard so as to correspond to the document
images. To retrieve the data, a user designates
these index data so that the contents of the corresponding
document are produced on the display device.
A system of this type has been disclosed, for example,
in the journal "NIKKEI COMPUTER" published by Nikkei
McGraw-Hill Co., December 26, 1985, pp. 62-64.
Since items common to all of the documents have
been selected as index data, the above-mentioned retrieval
data are not helpful for directly retrieving a document
that is being sought if the user does not remember
the name of the literature or the keyword, or if
he remembers them only vaguely. In this case, the document
images must be displayed successively so that the user
can find the correct one by eye.

As a method to facilitate the retrieval of
the document image, the applicant of the present
invention has proposed a method of retrieving images
in Japanese Patent Application No. 55073/1983 (Japanese
Patent Laid-Open No. 183458/198'^, U.S. Serial No.
594690), according to which secondary reference data
such as memos specific to the image data are registered,
and the image data is specified with reference to
the thus registered data at the time of retrieval.
Summary of the Invention 

The object of the present invention is to provide
a method of storing secondary data such as memos
which the user has added to the document, and a method
of retrieving the image data by utilizing the
secondary data.

Another object of the present invention is
to provide a method of storing image data which
is capable of storing a given region of the document
image as an index for retrieving the document image,
and a method of retrieving the document image by
utilizing the index.

To achieve the above-mentioned objects, the
method of storing document image data in a filing
system of the present invention comprises:

a first step for preparing a processed image
which includes the content of an original
document stored in a first memory region of the filing
system and secondary data;

a second step for comparing the processed
image with the image of the original document read from
said first memory region, in order to find
different portions;

a third step for classifying the different
portions according to a predetermined classification
standard;

a fourth step for specifying at least one local
region which has a predetermined relationship in
position, that is determined by the classification
standard, with respect to the different portion
in the processed image; and

a fifth step for storing the data that represents
the local region in a second memory region of the
filing system together with a code that makes said
data correspond to the original document.

The secondary data may include memos written by
the user on the document, and/or marks such as underlines
and boxes or surrounding marks attached to particular
words and descriptions in the document. The data
that represents the local region includes position
coordinates of the local region, the document image
in the local region, and character codes obtained
by discriminating the characters contained in the
local region. The data that represent the local
regions are rearranged for every classification
section of the secondary data, and are stored in
the second memory region.

According to the method of retrieving a document
image of the present invention, the user designates
a classification section of the secondary data,
so that the contents of the local region in the document
image or the contents of memo data corresponding
to the underlined portion are displayed in the form
of a look-up table or list on the display device,
and the user then selects one of the secondary data
that are displayed. Through the select operation,
a document image corresponding to the selected secondary
data is read from the first memory region and is
displayed.

According to the present invention, the user
inputs the processed document to easily prepare
a secondary data file that corresponds to the document
which has been registered in the filing system.
Most of the image of the processed document
has already been stored in the file of the original document.
Therefore, only the information added by the user, or the
image of the portion of the original document related
thereto, needs to be stored in the secondary data file.
Consequently, the secondary data file may be made up
of a personal file such as a floppy disc having
a small memory capacity. According to the present
invention, a plurality of users are allowed to have
their own secondary data files without changing
the contents of the common file in which are stored
images of the original documents, and are hence
allowed to quickly take out desired document images
from the common image file with reference to their
own memo data stored in the secondary data file.

The foregoing and other objects, advantages, 
manner of operation and novel features of the present 
invention will be understood from the following 
detailed description when read in connection with 
the accompanying drawings.
Brief Description of the Drawings

Fig. 1 is a block diagram illustrating an image
processing system according to an embodiment of
the present invention;

Fig. 2 is a diagram which schematically illustrates
a process for extracting the secondary data
from the processed document;

Fig. 3 is a program flow chart illustrating
the procedure for finding straight lines contained
in an image;

Fig. 4 is a diagram explaining how to detect
straight lines in relation to Fig. 3;

Fig. 5 is a diagram showing another embodiment
of a rectangular frame for correcting the skew;

Fig. 6 is a diagram explaining how to extract
the difference between an original document image
13 and a processed image 17;

Fig. 7 is a diagram explaining the classification
of the secondary data or the memo data;

Fig. 8 is a data table used for the classification
of the secondary data;

Fig. 9 is a program flow chart illustrating
the procedure for classifying the secondary data;

Figs. 10A and 10B are diagrams illustrating
a method of extracting a local region designated
by an underline;

Fig. 11 is a diagram showing a format of a
secondary data file;

Fig. 12 is a diagram showing the contents of
display in retrieving the image according to the
present invention;

Fig. 13 is a program flow chart which schematically
illustrates the overall functions of the image processing
system according to the present invention; and

Fig. 14 is a diagram showing a modified form
of the original document used for writing the secondary
data.

Description of the Preferred Embodiments

The invention will now be described by way
of embodiments.

Fig. 1 shows a system for retrieving document
image data to which the present invention is adapted,
wherein reference numeral 1 denotes a microprocessor
(CPU), 2 denotes a main memory (MM), 3 denotes a
keyboard (KB), 4 denotes an image scanner (IS),
5 and 5' denote filing devices (FD), 6 denotes a
bit map memory (BM), 7 denotes an image processor
(IP), 8 denotes a display controller (DC), 9 denotes
a CRT display, and 10 denotes a printer (PR).

First, described below is the processing for
extracting only the secondary data (memo portion)
from the processed image (hereinafter referred to
as image with memo) to which secondary data such
as a memo is added. Fig. 2 is a diagram which schematically
illustrates the process for extracting the
secondary data. First, an original document 11 is taken
from the image scanner 4 into the bit map memory
6 to obtain an original image 12. The original
image 12 is stored in the file 5 which consists,
for example, of an optical disc. In order to align
the positions of two images as will be described
later, the image processor 7 writes a rectangular
frame FR at a predetermined position on the original
image 12 according to the instruction from the
CPU 1, to thereby prepare an image 13 with frame,
which is output by the printer 10 as an original
paper 14 for processing the document. A memo (e.g.,
underline) is added onto the original paper 14 to
prepare a processed document 15 with memo. The
processed document 15 is read by the image scanner
4 and is input as an image 16 with memo to a region
of the bit map memory 6 different from the region
where the image 13 with frame has been stored.
With an ordinary image scanner, it is difficult
to convert the paper surface into an image without
any rotation or skew. Moreover, the printer 10 and
the image scanner 4 usually have different picture
element densities. In order to bring the skew and
size of the image 16 with memo into agreement with
those of the image 13 with frame, therefore,
normalization is effected by using, for example,
the rectangular frame FR to obtain a normalized
image 17. Then, the image 13 with frame and the
normalized image 17 are matched with each other
to obtain a differential image 18 in which only
the non-coincident portions are left. The differential
image contains, in addition to memo data, deterioration
data of the original image caused by passing
the image 13 with frame through the printer 10
and the image scanner 4. Finally, the deterioration
data is removed from the differential image 18 to
obtain a memo image 19 which contains memo data
only. The above-mentioned processing is wholly
controlled by the CPU 1, and the individual image
processings are performed by the image processor
7 according to the instruction from the CPU 1.
The image processings by the image processor 7 will
now be described in detail.

When the image 13 with frame is to be prepared
from the original image 12, the rectangular frame
FR that serves as an indication of reference position
is drawn at a position maintaining a predetermined
distance "a" from the edge of the original image 12.
The rectangular frame FR may be replaced by another
mark that indicates the position. To draw the straight
lines of the rectangular frame FR on the bit map
memory 6, the element patterns of the lines should
be written successively in the up and down direction
and in the right and left direction. In the write
processing of the rectangular frame, black picture
elements of the regions outside the frame FR on
the original image 12 are all converted into white
picture elements so that the normalization processing
can be carried out conveniently as will be described
later.
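
The frame-writing step above can be sketched as follows. This is only an illustrative sketch, not the processing actually performed by the image processor 7: it assumes the bit map is held as a list of rows of 0 (white) and 1 (black), that the frame is one picture element thick, and the function name write_frame is hypothetical.

```python
def write_frame(image, a):
    """Return a copy of a binary image with a rectangular frame drawn at
    distance `a` from every edge; everything outside the frame is cleared
    to white (0), everything inside is copied unchanged."""
    height, width = len(image), len(image[0])
    top, bottom = a, height - 1 - a
    left, right = a, width - 1 - a
    out = [[0] * width for _ in range(height)]  # start all white
    for y in range(top, bottom + 1):
        for x in range(left, right + 1):
            on_frame = y in (top, bottom) or x in (left, right)
            out[y][x] = 1 if on_frame else image[y][x]
    return out
```

Clearing the margin outside the frame means that, later on, the outermost long straight lines found in the scanned image can be taken to be the frame itself.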

In the processing for preparing the normalized
image 17 from the image 16 with memo, a step is
carried out to find the skew of the image 16 with
memo by detecting the four straight lines that constitute
the rectangular frame FR, and a step is further
carried out to convert the coordinates of the whole
image in order to correct the skew of the image
16 with memo and to correct, where necessary,
the size thereof, such that the rectangular
frame of the image 16 with memo and the rectangular
frame of the image 13 with frame are brought into
agreement with each other.

Straight lines constituting the rectangular
frame FR can be detected by a variety of known methods.
As one of such methods, use is made here of the known
algorithm of Hough conversion (Hough transform).

Fig. 3 is a program flow chart for detecting
the vertical line located on the left side among the
four straight lines of the rectangular frame FR,
and Fig. 4 is a diagram to explain the detection
of the line. In Fig. 4, $\gamma = x \sin\theta + y \cos\theta$
is the equation of the straight line that is to be
found, wherein $\gamma$ denotes the distance from the origin
O, and $\theta$ denotes the skew of the straight line. The
feature of Hough conversion is that straight
lines can be detected irrespective of partial noise
data in the image. The outline of this algorithm
is as described below. That is, in the flow chart
of Fig. 3, a candidate point P on the straight line,
such as a black picture element, is found in
the steps 23 to 25. In the steps 26 to 30, sets
of $\gamma$ and $\theta$ satisfying the equation
$\gamma = x \sin\theta + y \cos\theta$ are found as
straight lines that pass through this point P.
Then, the steps 23 to 32 are repeated
to find a frequency distribution $f(\gamma, \theta)$. Here,
$f(\gamma, \theta)$ represents the number of candidate points
located on the straight line $\gamma = x \sin\theta + y \cos\theta$.
In Fig. 4, for instance, the value $f(\gamma, \theta)$ of the
straight line $(\gamma, \theta)$ passing through the candidate
points P is large, whereas for other straight lines
$(\gamma, \theta)$ the value $f(\gamma, \theta)$
is as small as 0 to 2. By finding the $\gamma$ and $\theta$ that
render the value $f(\gamma, \theta)$ maximum in the step 33,
therefore, there can be obtained the parameters of the
straight line that passes through the greatest number
of candidate points. Equations of the right, upper
and lower straight lines of the rectangular frame
FR are then found in the same manner as described
above.
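
The voting procedure outlined above can be sketched as follows. This is a minimal sketch rather than the program of Fig. 3: it uses the line equation gamma = x sin(theta) + y cos(theta) given in the text, while the candidate angles, the bin width gamma_step and the function name hough_line are assumptions made for illustration.

```python
import math

def hough_line(points, thetas_deg, gamma_step=1.0):
    """Vote in (gamma, theta) space using gamma = x*sin(theta) + y*cos(theta)
    and return (gamma, theta_deg, votes) for the strongest line."""
    votes = {}
    for x, y in points:
        for t in thetas_deg:
            th = math.radians(t)
            gamma = x * math.sin(th) + y * math.cos(th)
            key = (round(gamma / gamma_step), t)  # quantise gamma into bins
            votes[key] = votes.get(key, 0) + 1
    (g_idx, t), count = max(votes.items(), key=lambda kv: kv[1])
    return g_idx * gamma_step, t, count

# A vertical line x = 5 (theta = 90 degrees in this convention) plus two noise points.
points = [(5, y) for y in range(100)] + [(20, 3), (40, 77)]
gamma, theta, count = hough_line(points, range(80, 101))
```

Because isolated noise points contribute only single votes to scattered bins, the maximum of the frequency distribution is not affected by them, which is the property the text relies on.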

Left upper, right upper, left lower and right
lower corner points of the rectangular frame FR
are found from the intersecting points of these
four straight lines, and are denoted as $(0, 0)$,
$(M_1, N_1)$, $(M_2, N_2)$, $(M_1 + M_2, N_1 + N_2)$ to form a
new coordinate system with the left upper corner
point as the origin. Described below is a step
for converting the whole image 16 with memo by utilizing
the rectangular frame FR. If the four corner points
of the image 13 with frame are denoted by $(0, 0)$,
$(m, 0)$, $(0, n)$ and $(m, n)$, the correspondence between
the image 16 and the image 17 can be expressed as

\[
\begin{pmatrix} x \\ y \end{pmatrix}
=
\begin{pmatrix} M_1/m & M_2/n \\ N_1/m & N_2/n \end{pmatrix}
\begin{pmatrix} X \\ Y \end{pmatrix}
\]

where $(x, y)$ represents the coordinate of a
picture element of the image 16 with memo,
and $(X, Y)$ represents the coordinate of the picture
element that corresponds to $(x, y)$ in the normalized
image 17. This relation sends the corner $(m, 0)$ to
$(M_1, N_1)$ and the corner $(0, n)$ to $(M_2, N_2)$.
Solved for $(X, Y)$, the same relation reads

\[
\begin{pmatrix} X \\ Y \end{pmatrix}
=
\frac{1}{M_1 N_2 - M_2 N_1}
\begin{pmatrix} m N_2 & -m M_2 \\ -n N_1 & n M_1 \end{pmatrix}
\begin{pmatrix} x \\ y \end{pmatrix}
\]

To prepare the normalized image 17, the coordinate $(x, y)$
corresponding to each lattice point $(X, Y)$ is found
from the first relation. Generally, however, the
coordinate $(x, y)$ does not fall on a lattice point on
which a picture element exists. Therefore, a fraction
of 1/2 or over is counted as one and the rest is
disregarded to find an integer, in order to use the
value of the lattice point which is closest thereto,
or the logical sum of concentrations of the surrounding
lattice points is interpolated to find a concentration
that is to be used as the concentration at $(x, y)$.

In the foregoing was described an embodiment which
utilizes the outer frame constituted by four lines
as a mark for aligning the position. However, methods
can further be contrived to achieve the matching by
attaching characteristic points to the four corners,
or to achieve the matching relying upon the
characteristic points of the original document
without adding any particular marks. The foregoing
description was also based on the prerequisite that
the distortion caused by the difference in the picture
element density between the printer 10 and the image
scanner 4 is linear. With an apparatus of the type in
which a line sensor of the image scanner 4 is driven
by a motor, however, there may develop non-linear
distortion caused by a drive speed which is not
constant. In this case, use is made of a sectionalized
frame FR' as shown in Fig. 5, the change in the
distance is detected relying upon a plurality of
parallel lines forming the sections, and the
aforementioned conversion is performed with a small
region as a unit, in order to obtain a normalized
image maintaining a high precision.
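
The coordinate conversion with rounding to the closest lattice point can be sketched as follows. This is a sketch under stated assumptions: the image 16 with memo is given as rows of 0/1 whose origin is already the left upper corner of the detected frame, the mapping used is x = (M1/m)X + (M2/n)Y and y = (N1/m)X + (N2/n)Y (so that the corner (m, 0) corresponds to (M1, N1) and (0, n) to (M2, N2)), and the function name normalize is hypothetical.

```python
import math

def normalize(image, M1, N1, M2, N2, m, n):
    """Build the (n+1) x (m+1) normalized image by looking up, for every
    lattice point (X, Y), the closest picture element (x, y) of `image`."""
    height, width = len(image), len(image[0])
    out = [[0] * (m + 1) for _ in range(n + 1)]
    for Y in range(n + 1):
        for X in range(m + 1):
            x = M1 / m * X + M2 / n * Y
            y = N1 / m * X + N2 / n * Y
            # A fraction of 1/2 or over is counted as one, the rest is dropped.
            xi, yi = math.floor(x + 0.5), math.floor(y + 0.5)
            if 0 <= xi < width and 0 <= yi < height:
                out[Y][X] = image[yi][xi]
    return out
```

Iterating over the lattice points of the output, rather than over the input picture elements, guarantees that every picture element of the normalized image receives exactly one value.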

Described below is a process for preparing a
differential image 18 from the image 13 with frame
and the normalized image 17. The differential image
can be easily prepared by comparing the image 13
with frame with the normalized image 17 with a
picture element as a unit, and rendering the
non-coincident portions black and the coincident
portions white. To distinguish the memo data
from the noise, however, the procedure should be
carried out as described below.

In Fig. 6, reference numeral 13 denotes an image
with frame having a pattern S1, 17 denotes a normalized
image having a pattern S1 that is deformed by noise
and having a pattern S2 added as memo data, 13' denotes
an image with frame which is processed to expand
the pattern S1, 18 denotes a differential image obtained
from the images 13' and 17, and 18' denotes a differential
image obtained from the images 13 and 17. Now,
if the concentrations of a given picture element $(x, y)$
of the image 13 with frame and of the normalized
image 17 are denoted by $f(x, y)$ and $g(x, y)$, the
images being supposed to be binary images, the concentration
of the black picture element being denoted by
"1" and the concentration of the white picture element
being denoted by "0", then the image 13' is obtained
by performing the operation

\[ f'(x, y) = f(x, y) \lor f(x+1, y) \lor f(x, y+1) \lor f(x+1, y+1) \]

for all x and y.

By using the image 13' with frame in which the
black picture element region is expanded as mentioned
above, if the region of black picture elements is
found in the normalized image 17 and the region of
white picture elements is found in the expanded image
13', there is obtained the differential image 18
which contains memo data S2 only. The above processing
is expressed by the following equation,

\[ h(x, y) = g(x, y) \land \overline{f'(x, y)} \]

As will be obvious from the comparison of the
image 18 with the differential image 18' which
indicates non-coincident portions between the image 13
and the normalized image 17, the expansion processing makes
it possible to make the differential image free from
the noise region S3 added to the initial pattern S1 or
the portions missing from the initial pattern, that
are caused as the image is passed through the printer
10 and the image scanner 4.

In the above embodiment, the expansion processing
was performed based upon the logical sum of the neighboring
four picture elements. It is, however, allowable
to expand the logical sum to include the neighboring
nine picture elements, the neighboring 16 picture
elements, and so on. Depending upon the kind of
memo to be treated, furthermore, the expansion processing
can be eliminated.
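
The expansion and difference operations described above can be sketched as follows. This is an illustrative sketch only: binary images are assumed to be lists of rows of 0/1, picture elements outside the image are treated as white, and the function names expand and difference are hypothetical.

```python
def expand(f):
    """f'(x, y) = f(x, y) OR f(x+1, y) OR f(x, y+1) OR f(x+1, y+1)."""
    height, width = len(f), len(f[0])

    def px(x, y):
        return f[y][x] if 0 <= x < width and 0 <= y < height else 0

    return [[px(x, y) | px(x + 1, y) | px(x, y + 1) | px(x + 1, y + 1)
             for x in range(width)] for y in range(height)]

def difference(f, g):
    """h(x, y) = g(x, y) AND NOT f'(x, y): black only where the normalized
    image g is black and the expanded framed image f' is white."""
    fe = expand(f)
    return [[g[y][x] & (1 - fe[y][x]) for x in range(len(g[0]))]
            for y in range(len(g))]
```

Expanding the stored image before the comparison widens the tolerance around every original stroke, so that small deformations of the pattern S1 do not survive into the differential image.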

A process for preparing a memo image 19 from
the differential image 18 will now be described.
This process is to remove, from the memo data,
the noise which could not be removed by the
above-mentioned process. Here, it is presumed that
the noise has a line width smaller than that of the
memo data, and the noise is removed by the contraction
conversion, and the line width of the memo data is
restored by the expansion conversion. The differential
image 18 is denoted by $h(x, y)$. First, the black
region of the differential image 18 is contracted.
The contraction is realized by effecting the operation

\[ h'(x, y) = h(x, y) \land h(x+1, y) \land h(x, y+1) \land h(x+1, y+1) \]

for all x and y. Then, the black region of the
contraction image $h'(x, y)$ is expanded in the same manner
as the aforementioned $f'(x, y)$. Depending upon the
line width of noise, the contraction and expansion may
be effected to include the neighboring nine picture elements,
the neighboring 16 picture elements, and so on, instead
of the logical sum or the logical product of the
neighboring four picture elements. In the process
to find the differential image, it often happens
that the memo data is cut off due to the process
for expanding the image 13 with frame. This defect,
however, can be repaired by effecting the expansion
conversion and then the contraction conversion prior
to effecting the above-mentioned process of contraction
and expansion.
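
The contraction followed by expansion can be sketched as follows. This is a sketch under assumptions, not the processing of the embodiment: images are lists of rows of 0/1, picture elements outside the image are white, and, as a choice made only for this sketch, the restoring expansion uses the mirrored neighbourhood (x-1, y-1) so that surviving strokes return to their original position. The function names are hypothetical.

```python
def contract(h):
    """h'(x, y) = h(x, y) AND h(x+1, y) AND h(x, y+1) AND h(x+1, y+1)."""
    rows, cols = len(h), len(h[0])

    def px(x, y):
        return h[y][x] if 0 <= x < cols and 0 <= y < rows else 0

    return [[px(x, y) & px(x + 1, y) & px(x, y + 1) & px(x + 1, y + 1)
             for x in range(cols)] for y in range(rows)]

def expand_back(h):
    """Logical sum over the mirrored 2 x 2 neighbourhood (assumption of this
    sketch) so that the restored region is not shifted."""
    rows, cols = len(h), len(h[0])

    def px(x, y):
        return h[y][x] if 0 <= x < cols and 0 <= y < rows else 0

    return [[px(x, y) | px(x - 1, y) | px(x, y - 1) | px(x - 1, y - 1)
             for x in range(cols)] for y in range(rows)]

def remove_thin_noise(h):
    """Contraction removes strokes thinner than two picture elements;
    the following expansion restores the width of what survives."""
    return expand_back(contract(h))
```

With a 2 x 2 neighbourhood, anything narrower than two picture elements disappears in the contraction, which matches the presumption that noise is thinner than memo strokes.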

Described below is a process which discriminates
the kind of memo data to prepare a secondary data
file depending upon the kind of memo data.
Fig. 7 shows a relationship among a processed
image 16 to which memo data is added, wherein X
represents arbitrary characters, a memo image 19,
and a divided-region image 20 in which a circumscribing
rectangle is found for each region of continuous
black picture elements in the memo image 19. Here,
the memo data includes three types of data, i.e.,
underline M1, box or surrounding mark M2, and notes
M3. In addition to these memo data, the memo image
19 contains noise N that was not removed from the
differential image by the conversion into the memo image
19. The noise component N is discriminated and removed
by the step of classifying the memo data that will
be described later.

First, described below is a process for preparing
the divided-region image 20 from the memo image 19.
Here, the image is divided into units of regions
of continuous black picture elements. A memo number
is attached to each of the regions. A variety of
algorithms have heretofore been proposed to cut out
the continuous region. The continuous region can
be cut out, for example, by the labelling algorithm
disclosed in a paper entitled "Pattern Data Processing"
by Makoto Nagao, the Japanese Association of Electronic
Communications, 1983, p. 84. Labels 1 to 6 attached
to the individual regions are entered as memo
numbers in a column 40 of a region table TB1 as shown
in Fig. 8. The heights of the regions are calculated
from the coordinates of the uppermost and lowermost
portions of the regions, and are written into a column
41 of the table. The widths are also calculated
from the coordinates of the leftmost and
rightmost portions, and are written into a column 42.
The region table TB1 is prepared in a work area
in the main memory 2.
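
The division into continuous regions and the preparation of the region table can be sketched as follows. Note that this sketch substitutes a simple breadth-first flood fill for the labelling algorithm cited in the text, and assumes 8-connectivity; the dictionary keys and the function name region_table are hypothetical.

```python
from collections import deque

def region_table(img):
    """Label 8-connected regions of black picture elements and return one
    table row per region: memo number, left/top corner, height and width
    of the circumscribing rectangle."""
    rows, cols = len(img), len(img[0])
    seen = [[False] * cols for _ in range(rows)]
    table = []
    for y0 in range(rows):
        for x0 in range(cols):
            if not img[y0][x0] or seen[y0][x0]:
                continue
            seen[y0][x0] = True
            queue = deque([(x0, y0)])
            xs, ys = [x0], [y0]
            while queue:
                x, y = queue.popleft()
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        nx, ny = x + dx, y + dy
                        if (0 <= nx < cols and 0 <= ny < rows
                                and img[ny][nx] and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((nx, ny))
                            xs.append(nx)
                            ys.append(ny)
            table.append({
                "memo_no": len(table) + 1,
                "left": min(xs), "top": min(ys),
                "height": max(ys) - min(ys) + 1,
                "width": max(xs) - min(xs) + 1,
            })
    return table
```

Each row carries only the circumscribing rectangle, which is all that the subsequent classification needs.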

The memo data are classified by calculating
a variety of parameters based upon the data of the
circumscribing rectangles found by dividing the region,
and comparing them with predetermined classification
standards. In this example, there are three
classification parameters that are stored in the columns
43 to 45 of the table TB1.

A first parameter stored in the column 43 is
defined by the width/height of the region and represents
the ratio of the width to the height of the region.
In the case of the "underline", the value of the
first parameter becomes greater than that of other
memo data. Depending upon the value of the first
parameter, therefore, the underline can be discriminated
from other memo data. A second parameter stored
in the column 44 is defined by the width plus height,
and represents the size of the region. In the case
of the noise, the value of the second parameter becomes
smaller than that of other memo data. Therefore,
the noise can be discriminated from the memos. A
third parameter stored in the column 45 represents
the ratio of black picture elements that occupy the
area of the region in the original image 12 at a
position that corresponds to the region. The box
or surrounding mark encloses the original image that
exists in the corresponding region. Therefore, the
third parameter has a large value for the box or
surrounding mark, whereas the value
is small in the case of other memo data. Namely,
the box or surrounding mark can be discriminated
from other memo data. Fig. 9 is a flow chart of
a program for classifying the memos using the
above-mentioned parameters, wherein $\theta_1$ to $\theta_3$ denote
threshold values of the first to third parameters.
In the foregoing description, only three kinds of
memo data were taken into consideration, i.e.,
underline, box or surrounding mark, and notes.
Depending upon the kinds of memos to be employed,
however, other discrimination parameters should
also be taken into consideration. For instance,
areas and space frequencies can be utilized. The
results of classification are stored in the column
46 of the table TB1.
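
The classification by the three parameters can be sketched as follows. This is a hedged sketch: the order in which the tests are applied, the threshold values t1 to t3 and the function name classify are all assumptions made for illustration, since the flow chart of Fig. 9 is not reproduced in the text.

```python
def classify(region, original, t1=5.0, t2=6, t3=0.1):
    """Classify one circumscribing rectangle as noise, underline, box or note.
    `region` holds left, top, width, height; `original` is the original
    image as rows of 0/1. Thresholds are illustrative assumptions."""
    left, top = region["left"], region["top"]
    w, h = region["width"], region["height"]
    p1 = w / h                      # first parameter: width / height
    p2 = w + h                      # second parameter: size of the region
    black = sum(original[y][x]
                for y in range(top, top + h)
                for x in range(left, left + w))
    p3 = black / (w * h)            # third parameter: black ratio in original
    if p2 < t2:
        kind = "noise"
    elif p1 > t1:
        kind = "underline"
    elif p3 > t3:
        kind = "box"
    else:
        kind = "note"
    return {"p1": p1, "p2": p2, "p3": p3, "kind": kind}
```

Testing for noise first keeps very small regions from being mistaken for short underlines; a different order may suit other kinds of memo.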

The secondary data file for retrieving the
document image is then prepared based upon the memo
data that are classified as described above.

For instance, the underline M1 is presumed
to be a sign that is drawn under the keywords in
a sentence, and the train of characters above the
underline in the document image is cut out so as
to be used as retrieval data. To cut out the train
of characters designated by the underline, a rectangular
region 83 having, as a base, the base 82 of a
circumscribing rectangle 81 of the underline M1 and having
a predetermined height H, is set as a processing
region on the original image 12 or on the image
13 with frame as shown in Fig. 10A. The image in
the rectangular region is then projected in the
lateral direction as shown in Fig. 10B in order
to find a distribution 84 of black picture elements.
From this distribution, a boundary 85 of the character
train 86 of the lowermost line in the rectangular
region 83 can be found, thereby to obtain the position
and size of the local region where the character
train 86 exists.
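
The lateral projection used to locate the character train above an underline can be sketched as follows. This is an illustrative sketch: the original image is assumed to be rows of 0/1, `base` is taken as the row index of the base of the underline's circumscribing rectangle, rows base-H to base-1 form the processing region, and the function name lowest_text_line is hypothetical.

```python
def lowest_text_line(original, left, right, base, H):
    """Project rows base-H .. base-1 (columns left..right) laterally and
    return (top_row, bottom_row) of the lowermost run of non-empty rows,
    or None if the region contains no black picture elements."""
    top_limit = max(0, base - H)
    profile = [sum(original[y][left:right + 1]) for y in range(top_limit, base)]
    y = len(profile) - 1
    while y >= 0 and profile[y] == 0:   # skip the gap above the underline
        y -= 1
    if y < 0:
        return None
    bottom = y
    while y >= 0 and profile[y] > 0:    # walk up through the character train
        y -= 1
    return top_limit + y + 1, top_limit + bottom
```

The first empty row met while walking upward plays the role of the boundary 85, so a height H that spans more than one text line still yields only the lowermost line.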

The box or surrounding mark M2 can be treated
as a sign that represents the number of an important
drawing quoted in the document. In this case,
characters in a local region in the original image
specified by the circumscribing rectangle of the
box or surrounding mark are recognized. The recognition
can be performed by adapting a variety of algorithms
that have heretofore been used with existing
OCR apparatus, or can simply be performed by the
method of pattern matching disclosed in the aforementioned
literature compiled by the Japanese Association
of Electronic Communications, p. 98. The recognized
result is used as retrieval data together with
a pointer from the sentence to the drawing number,
as has been disclosed, for example, in Japanese
Patent Application No. 273460/1984 entitled "a system
for retrieving document image data" filed by the
same applicant as the present application.

The note M3 is extracted as circumscribing
rectangles on the divided-region table TB1 with a
character as a unit. In order to collect a series
of characters into a local region, therefore, the
neighboring rectangular regions are collected together.
This process is realized by expanding the individual
circumscribing rectangles at a predetermined ratio,
and collecting the regions that overlap into
one.
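
The collection of neighboring note rectangles can be sketched as follows. This is a sketch under assumptions: rectangles are given as (x0, y0, x1, y1) with inclusive corners, each rectangle is enlarged about its centre by the fraction `ratio` of its own width and height, and the function name merge_notes is hypothetical.

```python
def merge_notes(rects, ratio=0.5):
    """Expand every rectangle by `ratio`, group rectangles whose expanded
    versions overlap, and return the bounding rectangle of each group."""
    def grow(r):
        x0, y0, x1, y1 = r
        dx = (x1 - x0 + 1) * ratio / 2
        dy = (y1 - y0 + 1) * ratio / 2
        return (x0 - dx, y0 - dy, x1 + dx, y1 + dy)

    def overlap(a, b):
        return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]

    groups = [[r] for r in rects]
    changed = True
    while changed:                       # repeat until no two groups touch
        changed = False
        for i in range(len(groups)):
            for j in range(i + 1, len(groups)):
                if any(overlap(grow(a), grow(b))
                       for a in groups[i] for b in groups[j]):
                    groups[i].extend(groups.pop(j))
                    changed = True
                    break
            if changed:
                break
    return [(min(r[0] for r in g), min(r[1] for r in g),
             max(r[2] for r in g), max(r[3] for r in g)) for g in groups]
```

The returned rectangles are computed from the unexpanded extents, so the expansion only decides which characters belong together and does not enlarge the stored local region.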



Fig. 11 shows a data format of the thus extracted
secondary data file 90, wherein reference numeral
91 denotes code data such as an image number that
serves as a pointer to the original image, and reference
numerals 92, 93, and 94 denote columns for storing
the secondary data that correspond to the note M3,
underline M1, and box or surrounding mark M2. A
coordinate $(x_0, y_0)$ represents the position at the
left upper corner of a circumscribing rectangle
of the memo data, a coordinate $(x_1, y_1)$ represents
the position at the right lower corner of the same
rectangle, and P denotes a pointer to the note image.
The note image does not exist in the file of the
original image but is obtained from the processed
image, and is stored in a separate memory region
in the filing apparatus 5 in which the original
image has been stored, or is stored in a separate
memory region in the filing apparatus 5' which forms
the secondary data file 90. A coordinate $(x_2, y_2)$
represents the position at the left upper corner of
a character train rectangle 86, a coordinate $(x_3, y_3)$
represents the position at the right lower corner
of the same rectangle, a character code 95 is obtained
by recognizing a character in the character train
specified by the box or surrounding mark M2, and
a column 96 is an area for storing pointers that
indicate the correspondence to the drawing images
designated by the box or surrounding mark M2. At
the stage where the drawing number specified by the
box or surrounding mark M2 is recognized from the
document image, the column 96 of pointers remains
blank. Namely, the image data that have been stored
are searched successively, and the image number
is stored as a pointer in this column 96 at the moment
when the image of the corresponding drawing is found.
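
One record of such a secondary data file can be sketched as follows. This is only an illustrative data structure: the class names, field names and the helper lookup_table are hypothetical, and the exact layout of Fig. 11 is not reproduced here.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

Rect = Tuple[int, int, int, int]  # (left, top, right, bottom)

@dataclass
class NoteEntry:
    rect: Rect                   # (x0, y0, x1, y1) of the note
    note_image_pointer: int      # pointer P to the stored note image

@dataclass
class BoxEntry:
    character_code: str                      # recognized drawing number
    drawing_pointer: Optional[int] = None    # blank until the drawing is found

@dataclass
class SecondaryRecord:
    image_number: int                                       # pointer to original image
    notes: List[NoteEntry] = field(default_factory=list)    # column for notes M3
    underlines: List[Rect] = field(default_factory=list)    # column for underlines M1
    boxes: List[BoxEntry] = field(default_factory=list)     # column for boxes M2

def lookup_table(records, kind):
    """List (image_number, entry) pairs for one kind of memo, as would be
    shown to the user when that classification section is designated."""
    return [(r.image_number, e) for r in records for e in getattr(r, kind)]
```

Because a record holds mostly coordinates and pointers, it stays small enough for a low-capacity personal medium, in line with the remark that follows.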

The data in the secondary data file 90 consist
chiefly of coordinates that specify the local region
in the original image. When the document image
is to be retrieved, the local region is extracted
from the document image based upon the coordinate
data to display the contents thereof. Since the
amount of secondary data corresponding to the document
images is not so large, the secondary data file
90 may be comprised of a magnetic memory device
having a relatively small memory capacity, i.e.,
may be comprised of a filing device 5' such as a
floppy disc, that is separate from the filing device
5 of a large capacity such as an optical disc which
stores document images. The secondary data may
be stored in the file 5' consisting of an optical
disc, as a matter of course. In the above-mentioned
embodiment, furthermore, the local region designated
by the underline M1 is stored by way of position
coordinates, and the contents of the local region
are extracted from the file of original images.
This, however, may be so modified that a keyword
contained in each of the local regions is stored
as a character code in the column 93 of the secondary
data file, and is used for matching
against a designated keyword at the time
of retrieving the image. In the above description,
the underline was a cut out mark, and the box or surrounding
mark was a recognition mark. It is, however, also allowable
to define the underline as a recognition mark and the box
or surrounding mark as a cut out mark. Further, in the foregoing,
the drawing number was described as being recognized.
However, it is also allowable to recognize the literature
number and to cut out the name of the corresponding
literature from the end of the literature, in order
to store and display it. It is further possible
to provide a correcting function relying upon
interactive processing to cope with situations
where the classified results are not correct.
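
The two ways of using the underline data described above, cutting the local region out of the original image by its stored coordinates or matching stored keywords, might look like the sketch below. It assumes the image is a list of pixel rows, that both corner coordinates are inclusive, and that keywords are kept in a hypothetical table keyed by image number; none of these representations is specified in the description.

```python
from typing import Dict, List, Sequence, Tuple

Rect = Tuple[int, int, int, int]  # (x_left, y_top, x_right, y_bottom)


def crop_region(image: Sequence[Sequence[int]], rect: Rect) -> List[List[int]]:
    """Cut the local region out of the original image.

    Both corners are treated as inclusive (an assumption of this sketch).
    """
    x0, y0, x1, y1 = rect
    return [list(row[x0:x1 + 1]) for row in image[y0:y1 + 1]]


def find_by_keyword(keywords_by_image: Dict[int, List[str]],
                    keyword: str) -> List[int]:
    """Return the image numbers whose stored keywords contain `keyword`."""
    return sorted(number for number, words in keywords_by_image.items()
                  if keyword in words)
```

Storing only coordinates keeps the secondary data small at the cost of reading the original image at display time, whereas storing character codes allows keyword matching without touching the original image file.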

Described below is a method of retrieving the 
document image according to the present invention 
by utilizing the contents of the above-mentioned 
secondary data file. 

Fig. 12 shows a retrieval screen displayed
on the CRT 9 in retrieving the document image, wherein
reference numeral 50 denotes an ordinary retrieval
screen which depicts the results obtained when a classification
code is designated. If the user requests
the display of the underlined portions under the condition
where data such as the name of the document or its
number has not been clearly stored, only the underlined
portions are displayed with a document as a unit
as designated at 51 in Fig. 12. Further, if the
user requests display of the drawings or the notes,
the contents are displayed as designated at 52 or
53 in Fig. 12. The drawing numbers have been recognized
and associated with the description. Therefore,
the retrieval system of the aforementioned application
that has been filed already can be utilized to compare
the description with the drawings after the retrieval
to display them. It is allowable, as a matter of course,
to display, at the time of retrieval, a plurality of
kinds of memos in the form of a list.

Fig. 13 shows a whole procedure for storing 
and retrieving the document image data executed 
by the document image processing system of Fig. 1. 
The image processing can be roughly divided into 
four processes that are executed depending upon 
the command inputs (step 100) sent from the keyboard 3. 

A first process is to store a document image 
through steps 110 to 118. In the step 110, the 
image of the original document is input from the 
image scanner 4 to a predetermined area in the bit 
map memory 6, and the input image is displayed on 
the CRT 9 (step 112). If the input image is not 
perfect, and the operator instructs to input the 
image again, the process returns to the step 110. 
The operator who has confirmed the image quality
then inputs the document number corresponding to
the document, the classification code and the keyword (step
116) using the keyboard 3. The input document image
is therefore stored in the filing device 5 together
with these codes.

A second process is to print out the original paper
for a processed document through steps 120 to 132.
As the document number or the keyword is input (step
120) through the keyboard 3, a corresponding document
image is retrieved (step 122) out of the documents
stored in the filing device 5, read into the bit
map memory 6 and displayed on the CRT 9 (step
124). To the document image is added a rectangular
frame FR (step 126) to indicate the reference position
mentioned earlier. The region surrounding the rectangular
frame is cleared (step 128), and the document
image is sent as an image with frame to the CRT
9 and the printer 10 (steps 130 to 132).

A third process is to store the secondary data
through steps 140 to 162. In this process, a processed
document, consisting of the document with frame produced
previously onto which memo data are written, is input
from the image scanner 4 (step 140), and
the skew is corrected and, as required, the size
is corrected (step 142). Next, as the operator
inputs the document number or the keyword of the
document, the corresponding original image is retrieved
from the filing device 5 and is displayed (steps
144 to 146). If the memo data and the original
correspond to each other, the memo data are extracted
and classified through steps 150 to 156, and the
results are displayed. If the operator who has
confirmed the displayed contents inputs an OK sign,
the memo data are stored in the aforementioned secondary
data file 5' (step 162). If the displayed contents
are not correct, the error is corrected at a step
160, and the corrected result is stored in the secondary
data file.
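
The extraction step of the third process, in which the processed image is compared with the original to isolate the handwritten memo data, can be illustrated with the simplified sketch below. It assumes both images are binary (1 for a black pixel), of the same size, and already brought into alignment by the skew and size correction; a plain pixel-wise difference is used here for illustration and is not claimed to be the exact comparison performed by the apparatus.

```python
from typing import List, Optional, Tuple

Image = List[List[int]]           # binary image: 1 = black pixel, 0 = white
Rect = Tuple[int, int, int, int]  # (x_left, y_top, x_right, y_bottom)


def extract_memo(original: Image, processed: Image) -> Image:
    """Keep only pixels that are black in the processed image but not in
    the original, i.e. the additionally written portions."""
    return [[1 if p and not o else 0 for o, p in zip(orow, prow)]
            for orow, prow in zip(original, processed)]


def circumscribing_rect(mask: Image) -> Optional[Rect]:
    """Return the circumscribing rectangle of all set pixels, or None
    when the mask is empty (no memo data found)."""
    xs = [x for row in mask for x, value in enumerate(row) if value]
    ys = [y for y, row in enumerate(mask) if any(row)]
    if not xs:
        return None
    return (min(xs), min(ys), max(xs), max(ys))
```

In practice small residual misalignment would leave spurious difference pixels, so some tolerance would be needed before the difference is taken; that refinement is omitted from the sketch.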

A fourth process is to retrieve the document
image through steps 170 to 176. As the operator
designates the classification code of memo data,
the document data file 5 is accessed in accordance
with the data of a corresponding classification
section in the secondary data file 5'. For example,
the image of a local region in the document corresponding
to the underline is displayed in the form
of a list (step 172). When the "note" is designated,
the image of note characters is read out from a
separate region of the secondary data file 5', and
is displayed in the form of a list. If the operator
selects any one of the memo data (step 174) with
reference to the contents of the list that is displayed,
the document image of a corresponding image number
is retrieved from the file 5, and is displayed on
the CRT (steps 176 and 178).
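
The retrieval flow of the fourth process can be sketched as a listing step followed by a selection step. The dictionary-based representation of the secondary data file and of the document file, and the function names, are assumptions made for this example only.

```python
from typing import Any, Dict, List, Tuple


def list_by_kind(secondary_file: List[Dict[str, Any]],
                 kind: str) -> List[Tuple[int, Any]]:
    """Collect the memo entries of one kind (e.g. "underline" or "note")
    from every record, paired with the image number of their document."""
    return [(record["image_number"], entry)
            for record in secondary_file
            for entry in record.get(kind, [])]


def retrieve(document_file: Dict[int, Any],
             listing: List[Tuple[int, Any]],
             choice: int) -> Any:
    """Read out the original document image for the selected list entry."""
    image_number, _entry = listing[choice]
    return document_file[image_number]


# Usage: list all underline entries, then open the document of the third one.
secondary_file = [
    {"image_number": 1, "underline": ["region A"], "note": ["note image 0"]},
    {"image_number": 2, "underline": ["region B", "region C"]},
]
document_file = {1: "image-1", 2: "image-2"}
listing = list_by_kind(secondary_file, "underline")
selected = retrieve(document_file, listing, 2)
```

Because every list entry carries the image number of its document, the operator's choice from the list is enough to locate the original image without a separate search.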

In the above-mentioned flow chart, the operator
inputs the image number of the original image in
storing the secondary data. This, however, may
be eliminated by automatically outputting the image
number 91' to the processed document in the second
process as shown in Fig. 14, and by automatically
recognizing the image number 91' in the step 144. In the above
embodiment, the secondary data were displayed depending
upon the kinds. However, the system may be so designed
that two or more kinds of secondary data are displayed
simultaneously.

According to the present invention, as will
be understood from the foregoing description, the
user stores the notes written onto the document,
as well as principal words and sentences in the
document specified by a mark such as an underline,
in the form of secondary data and memo data that
are to be retrieved. Therefore, even when the user
does not remember the correct name of the document
or the keyword, or even when he remembers them only vaguely,
the desired document can be efficiently retrieved
with reference to the secondary data file. By
utilizing the secondary data, a plurality of documents
can be displayed at one time in a compact form,
making it possible to greatly reduce the time for
retrieval compared with the method by which the
contents of the whole documents are successively
displayed and retrieved one by one from the file
of the original documents. Moreover, each user
is allowed to possess the secondary data file as
his own file and is, hence, permitted to store and
utilize his own memo data irrespective of other
users. In the above embodiment, although the document
file 5 is made up at the user side through the first process,
an existing document file supplied by a publishing firm
may be used to produce the original paper 14.

CLAIMS:

1. A method of storing document data in
a filing system comprising:

a first step for preparing a processed image which
includes the content of an original document in a first
memory region of said filing system and a secondary data;

a second step for comparing the processed image
with the image of the original document read from said
first memory region, in order to find different portions;

a third step for classifying the different
portions according to a predetermined classification
standard;

a fourth step for specifying at least one local
region which has a predetermined relationship in
position, that is determined by the classification
standard, with respect to the different portion
in the processed image; and

a fifth step for storing the data that represents
said local region in a secondary memory region of
said filing system together with a code that makes
said data correspond to said original document.

2. A method of storing document data in
a filing system according to claim 1, wherein the
data that represents said local region contains
position coordinates of the local region.

3. A method of storing document data in
a filing system according to claim 1, wherein the
data that represents said local region is an image
data contained in said local region.

4. A method of storing document data in
a filing system according to claim 1, further comprising:

a step for discriminating characters contained
in said local region to convert them into character
codes which will be stored in the second memory
region in said fifth step.

5. A method of storing document data in
a filing system according to claim 1, wherein the
data that represents the local region is stored
in said second memory region for each of said classifications
in said fifth step.

6. A method of storing document data in
a filing system according to claim 1, wherein said
secondary data includes a mark that is jotted down
onto the document to designate said local region.

7. A method of storing document data in
a filing system according to claim 6, wherein said
mark consists of a marking line for designating
a local region that includes a plurality of characters
in the processed document.

8. A method of storing document data in
a filing system according to claim 6, wherein said
mark consists of a frame that surrounds a plurality
of characters in the processed document.

9. A method of storing document data in
a filing system according to claim 1, wherein said
secondary data contains a word that consists of
a plurality of characters.

10. A method of storing document data in
a filing system according to claim 9, wherein said
secondary data contains a mark that is jotted down
onto the document to designate said local region.

11. A method of storing document data in
a filing system according to claim 1, further comprising:

a step for preparing a document to write said
secondary data in a form in which at least one indication
to indicate a reference position is added at
a predetermined position of an image read from said
first memory region; and

a step for determining and correcting the skew
of a processed image that contains the indication
of said reference position, relying upon the indication
of said reference position in said image;

wherein the corrected image is compared
with the original document image in said second
step.

12. A method of retrieving a desired document image
out of document images stored in a filing apparatus,
comprising:

a first step for preparing a file of secondary
data by extracting additionally written matters
from a processed image that consists of the contents
of the original document image stored in said filing
apparatus and additionally written matters, said
file storing the secondary data together with codes
that make the secondary data correspond to the
original document image, said secondary data being
classified depending upon the kinds of said additionally
written matters;

a second step for designating at least one
section of the secondary data;

a third step for displaying the secondary data
that corresponds to said designated section;

a fourth step for specifying a document image
that is to be read out from the filing apparatus
with reference to the secondary data that is displayed;
and

a fifth step for reading out the specified
document image from the filing apparatus.

13. A method of retrieving a desired document image
out of the document images stored in the filing
apparatus according to claim 12, wherein said secondary
data contains data that specifies a local region
of the original document image, and the image of
said local region is displayed in the fourth step.

14. A method of retrieving a desired document image
out of the document images stored in the filing
apparatus according to claim 12, wherein said additionally
written matters include marks and notes
for designating part of the regions in the original
document image, said secondary data contains coordinate
data for specifying part of the regions of said
original document image corresponding to said marks
and contains local image that includes said notes,
and wherein when a section corresponding to said
mark is designated in said step, part of the regions
of the original document image is extracted and
displayed relying upon said coordinate data.

In order to utilize memo data written by hand at the time
of retrieving the document, the original document image
data is stored and, thereafter, a document image data with
memo is separately input, the document image data with
memo being comprised of the original document image onto
which is additionally written memo data by hand. The position
of the document image data with memo is then brought
into alignment with the original document image data and is
collated to extract only those memo data written by hand.
Then the kind of memo is discriminated, the process is
effected depending upon the kind of memo, and the memo
data is stored in the secondary data file for retrieval. To
retrieve the document, the memo data of a plurality of
documents are read out from the secondary data file depending
upon the kind of memo that is designated, and are
displayed as a look-up table. If the operator designates any
one of them, the image of the corresponding original document
is read out from the original document image file and
is displayed.